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In geostatistics, it is common to model spatially distributed phe- 
nomena through an underlying stationary and isotropic spatial pro- 
cess. However, these assumptions are often untenable in practice be- 
cause of the influence of local effects in the correlation structure. 
Therefore, it has been of prolonged interest in the literature to pro- 
vide flexible and effective ways to model nonstationarity in the spatial 
effects. Arguably, due to the local nature of the problem, we might 
envision that the correlation structure would be highly dependent on 
local characteristics of the domain of study, namely, the latitude, lon- 
gitude and altitude of the observation sites, as well as other locally 
deflned covariate information. In this work, we provide a flexible and 
computationally feasible way for allowing the correlation structure 
of the underlying processes to depend on local covariate informa- 
tion. We discuss the properties of the induced covariance functions 
and methods to assess its dependence on local covariate information. 
The proposed method is used to analyze daily ozone in the southeast 
United States. 

1. Introduction. The advance of technology has allowed for the storage 
and analysis of complex data sets. In particular, environmental phenomena 
are usually observed at fixed locations over a region of interest at several 
time points. The literature on modeling spatiotemporal processes has been 
experiencing a significant growth in the recent years. The main objective 
of this research is to define flexible and realistic spatiotemporal covariance 
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structures, since predictions for unobserved locations and future time points, 
and the corresponding prediction error variances, are highly dependent on 
the covariance structure of the process. An important challenge is to specify 
a flexible covariance structure, while retaining model simplicity. 

In this paper we are concerned with modeling ozone levels observed in 
the southeast USA. We explore models for ozone which allow the covari- 
ance structure to be nonseparable and nonstationary. Many spatiotemporal 
models have been proposed for ambient ozone data for various purposes. 
Guttorp, Meiring and Sampson (1994) and Meiring, Guttorp and Sampson 
(1998) generate predictions using independent spatial deformation models 
for each time period to evaluate deterministic models. Carroll et al. (1997) 
combine ozone predictions with population data to calculate exposure in- 
dices. Huerta, Sanso and Stroud (2004) and Dou, Le and Zidek (2010) use 
a dynamic linear model to perform short-term forecasting over a small re- 
gion, while Sahu, Gelfand and Holland (2007) use a dynamic linear model to 
predict temporal summaries of ozone and examine meteorologically- adjusted 
trends over space. Gilleland and Nychka (2005) seek a method for drawing 
attainment boundaries. McMillan et al. (2005) present a mixture model 
that allows heavy ozone production and normal regimes; the probability 
of each depends on atmospheric pressure. Berrocal, Gelfand and Holland 
(2010) combine deterministic model output with observations via a compu- 
tationally efficient hierarchical Bayesian approach. Nail, Hughes-Oliver and 
Monahan (2010) explicitly model ozone chemistry and transport with ad- 
ditional goals of decomposition into global background, local creation and 
regional transport components, and of long-term prediction under hypothet- 
ical emission controls. 

A challenging aspect of modeling ozone is its complex relationship with 
meteorology. Tropospheric ozone is a secondary pollutant in that it is not di- 
rectly emitted from cars, power plants, etc. Instead, it is formed from photo- 
chemical reactions of precursors nitrogen oxides (NOx) , and volatile organic 
compounds (VOCs), which are primary pollutants. The reactions that form 
ozone are driven by sunlight, so that ambient concentrations are highest in 
hot and sunny conditions, and ozone, NOx and VOCs are transported on 
the wind, so that emissions at one site affect ozone at another. It is therefore 
natural to wonder whether meteorological variables affect not only the mean 
concentration, but also its variance and spatiotemporal correlation. Of the 
studies mentioned, Guttorp, Meiring and Sampson (1994), Meiring, Guttorp 
and Sampson (1998), Huang and Hsu (2004) and Nail, Hughes-Oliver and 
Monahan (2010) model the dependence of the covariance on covariates in 
some form. Guttorp, Meiring and Sampson (1994) and Meiring, Guttorp 
and Sampson (1998) allow the spatial covariance to vary by hour of the day, 
while Nail, Hughes-Oliver and Monahan (2010) allow it to vary by season. 
Huang and Hsu (2004) allow the covariance to vary as a function of wind 
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speed and direction, and Nail, Hughes-Oliver and Monalian (2010) model 
the transport of ozone using wind speed and direction. 

We present a class of spatiotemporal covariance functions that allow the 
meteorological covariates to affect the covariance function [Schmidt, Gut- 
torp and O'Hagan (2011), Schmidt and Rodriguez (2011)]. This produces 
a nonstationary covariance, since the correlation between pairs of points 
separated by the same distance may be different depending on local me- 
teorological conditions. Sampson and Guttorp (1992) were among the first 
to propose a nonstationary spatial covariance function by making use of 
a latent space wherein stationarity holds. Schmidt and O'Hagan (2003) 
proposed a Bayesian model using the idea of the latent space where in- 
ference is performed under a single framework. Higdon, Swall and Kern 
(1999) use a moving average convolution approach based on the fact that 
any Gaussian process can be represented as a convolution between a kernel 
and a white noise process; nonstationarity results from allowing the kernel 
to vary smoothly across locations. Fuentes (2002), instead, assumed that 
the spatial process is a convolution between a fixed kernel and independent 
Gaussian processes whose parameters are allowed to vary across locations. 
Paciorek and Schervish (2006) generalize the kernel convolution approach 
of Higdon, Swall and Kern (1999). On the other hand, Cressie and Huang 
(1999), Gneiting (2002) and Stein (2005) present examples of nonseparable 
stationary covariance functions for space-time processes. Although these 
models provide flexible covariance structures, they usually have many pa- 
rameters, which may be challenging to estimate. 

Cooley, Nychka and Naveau (2007) capture nonstationarity using co- 
variates (but not geographic coordinates) to model extreme precipitation. 
Schmidt, Guttorp and O'Hagan (2011) extended the work of Schmidt and 
O'Hagan (2003) by allowing both geographic coordinates and covariates to 
define the axis of the latent space. They also provide a particular case of 
the general model which has a simpler structure but is still able to capture 
nonstationarity. Schmidt and Rodriguez (2011) apply this simpler version of 
the model in the case of multivariate counts observed across the shores of 
a lake. 

In this paper we provide a more flexible covariance model that allows not 
only the distance between covariates, but also the covariate values them- 
selves to affect the spatial covariance. For example, the spatial covariance is 
allowed to be different for a pair of observations with the same temperature 
on a cold day than for a pair of observations with the same temperature on 
a warm day. Following Fuentes (2002), we model the spatial process at lo- 
cation s, n{s), as a linear combination of stationary processes with different 
covariances. 
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where Wj{s) are the weights and 6j are independent zero-mean Gaussian 
processes with covariance Kj. Fuentes (2002) models the weights as kernel 
functions of space centered at predefined knots (pj, so that Kj represents 
the local covariance for sites near (pj. In contrast, we specify the weights 
in terms of spatial covariates, so that Kj represents the covariance under 
environmental conditions described by the covariates. 

The paper proceeds as follows. Section 2 introduces the model and Sec- 
tion 3 discusses its properties. Model-fitting issues and computational details 
are discussed in Sections 4 and 5, respectively. We analyze ozone data in Sec- 
tion 6. We find that the spatial correlation is stronger on windy days, and 
that temporal correlation depends on temperature and cloud cover. Section 7 
concludes. 

2. Covariate-dependent covariance functions. Let y{s,t) be the obser- 
vation taken at spatial location s G TZ^ and time t & TZ. The response is 
modeled as a function of p covariates x(s,t) = [xi{s,t), . . . ,Xp{s,t)]'^ , where 
xi{s,t) = 1 for the intercept. We assume that 

(2) y(s, t) = x(s, tf(3 + 6{s) + ^(s, t) + e(s, t), 

where /3 is the p-vector of regression coefficients, (5 is a Gaussian process 
to capture the overall spatial trend remaining after accounting for x(s,t)-^/3 

[Stein and Fang (1997)], /u(s,t) is a spatiotemporal effect, and e(s,t) ^'^^ 
N(0, cr^) is pure error. 

The spatiotemporal process fx is taken to be a Gaussian process with 
mean zero and covariance that may depend on (perhaps a subset of) the 
covariates, x. As described in Section 1, we model linear combination 

of stationary processes, 

M 

(3) n{s,t) = '^Wj[x{s,t)]ej{s,t), 

where 6j are independent Gaussian processes with mean zero and covari- 
ance Kj and ?i;j[x(s, t)] is the weight on process j. The motivation for this 
model is that different environmental conditions, described by the covariates, 
may favor different covariance functions. The weight Wj[x{s,t)] determines 
the spatiotemporal locations where the covariance function Kj is the most 
relevant. 

Integrating over the latent processes 9j, the covariance becomes 

M 

(4) Cov[^(s, t),fi{s', t') |x] = Wj [x(s, t)\wj [x(s', t')]Kj (s - s', t - t')- 

i=i 

With M = 1, only the variance of the process depends on the covariates, and 
the correlation, Ki{s — s',t — t')/ Ki{0,0), is stationary. With M > 1, both 
the variance and the correlation depend on the covariates. 
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(a) x{s) = s"^ (b) a;{s) = siii(47rs) 



Fig. 1. Covanance functions for a one- dimensional spatial process with M — 2, 
logit(w2(s)) = 3;(s), wi{s) = 1 — if2(s), Ki{s — s') = exp( — |s — s'|/0.02), and 
K2{s - s') = exp(-|s - s'l/0-50). 

As an illustration of the flexible spatial patterns allowed by our specifica- 
tion, Figure 1 plots the spatial covariance for two simple examples. In both 
cases we assume a one-dimensional spatial grid with s €TZ, a single covari- 
ate x{s), and that the spatial correlation is high in areas with large x{s). 
Both examples have M = 2, logit(w2(s)) = x(s), = 1 — W2{s), Ki{s — 

s') = exp(-|s - s'l/0.02), and K2{s - s') = exp(-|s - s'|/0.50). Figure 1 
shows the covariance for x{s) = and x{s) = sin(47rs). For the quadratic 
covariate, the second term has higher spatial correlation and the weight on 
the second process is high for locations with large therefore, the spatial 
correlation is stronger for s near —1 and 1 where x{s) is high. The spatial 
covariance is not a monotonic function of spatial distance for the periodic 
covariate. This may be reasonable if, say, x{s) is elevation and a site with 
high elevation shares more common features with other high-elevation sites 
than nearby low-elevation sites. 

There is confounding in (4) between the scale of the weights Wj and covari- 
ances Kj , since multiplying the weights by the constant c > and dividing 
the standard deviation of Kj by c gives the same covariance. Therefore, for 
identification purposes we restrict the squared weights for each observation 
to sum to one, X^^li u^j[x(s, f)]^ = 1. Also, allowing the weights to be neg- 
ative would result in a negative spatiotemporal covariance if i(;j[x(s,t)] > 
and Wj\x.{s' ,t')] < 0. In some situations this may be desirable, however, we 
elect to restrict Wj[x{s,t)] > to ensure a positive spatiotemporal covari- 
ance. Section 4 discusses weight selection in further detail. 

An important consequence of the covariance construction in (3) is that 
values of the process at two sites are uncorrelated unless at least one of the M 
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weight functions is positive at both sites. Therefore, unhke other nonstation- 
ary covariance models [e.g., Sampson and Guttorp (1992) and Higdon, Swall 
and Kern (1999)], it may be difficult to separate strength of dependence from 
severity of nonstationarity. For example, if M = 2, wi{si) = 'W2{s2) = 1, and 
^^2(^1) = Wi{s2) = 0, then not only is the covariance near Si different than 
the covariance near S2, but /i(si) and /i(s2) are necessarily uncorrelated. 
If this is deemed undesirable for a particular application, one alternative 
would be to allow for dependence between the latent 9j using a multivariate 
spatial model. Another option would be to use only covariates in the weights 
that have larger spatial range (perhaps pre-smoothed covariates) than the 
latent 0j processes, in which case this scenario is less likely. Section 3.2 pro- 
vides further discussion about the relative roles of the spatial range of the 
latent and covariate processes. 

This covariance model has interesting connections with other commonly 
used spatial models. For example, if we consider purely spatial data, as men- 
tioned in Section 1, taking the weights to be kernel functions of the spatial lo- 
cation alone, that is, i(;j[x(s)] = Wj{s), gives the nonstationary spatial model 
of Fuentes (2002). By modeling the weights as functions of the covariates, it 
may be possible to explain nonstationarity with fewer terms, giving a more 
concise and interpretable model. Also, with M = p and 7Uj[x(s)] = Xj{s) 
for j = 1, . . . ,p, we obtain the spatially- varying coefficient model of Gelfand 
et al. (2003). In this model 6j{s) represents the effect of the jth covariate 
at location s. The motivation for the spatially-varying coefficients model 
is to study local effects of covariates on the mean response. In contrast, 
our objective is to model the covariance. For example, in a situation with 
p = 20 covariates it may be sufficient to describe the spatial covariance using 
M = 2 stationary processes where conditions that favor the two covariance 
functions are described by weights wi and W2 that depend on all p covariates. 
Therefore, to provide an adequate description of the covariance, we assume 
the weights are random functions of unknown parameters that describe en- 
vironmental conditions (see Section 4) rather than taking the weights to be 
the covariates themselves. Finally, setting the weights wj to be constant in 
time and the latent processes 9j to be constant over space gives the spa- 
tial dynamic factor model of Lopes, Salazar and Gamerman (2008). Our 
model differs from this approach since our weights (loadings) are functions 
of spatial covariates rather than purely stochastic spatial processes. 

3. Properties of the covariance model. In this section we discuss some 
properties of the proposed model in (3) and the spatiotemporal covariance 
function. For example, it is clear that even if the individual covariances Kj 
are separable, stationary and isotropic, the resulting covariance (4) is in 
general nonseparable, nonstationary and anisotropic. Below we discuss other 
properties of the covariance model. 
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3.1. Monotonicity of the spatial covariance function. As shown in Fig- 
ure 1, the covariance function can be a nonmonotonic function of spatial 
distance, even if the underlying covariances Kj are decreasing. Intuitively, 
this occurs only if the spatial range of the covariates is small relative to the 
spatial range of the covariance functions Kj . More formally, assuming s^TZ 
and the Wj and each component of x are differentiable, then for any /i > 

(9Cov(^(s),/i(s + /i)|x) 

Wj{^[s\)wj{ii.[s + h])Kj{h) ^ ' ' 



Wj{-K[s + h]) Kj{h) 



both w'j and K'j are derivatives with respect to h. Therefore, if the weights 
Wj(x.[s]) and covariance Kj{h) are positive, a sufficient but not necessary 
condition for a monotonic covariance is that Wj{x.[s + h])/wj{x.[s + h]) + 
K'j{h)/Kj{h) < for all j. The ratios w'j{x[s])/wj{x[s]) and -K'j{h)/Kj{h) 
can be interpreted as the elasticity of the weight function (which depends 
on both the weight function itself and the derivative of the covariate pro- 
cess) and covariance function, respectively. This condition makes the initial 
statement more precise, in that (5) is negative if the elasticity of the weight 
function is less than the elasticity of the spatial covariance. 

In the special case of a powered-exponential covariance model Kj{s,s + 
h) = r?exp(— pj/i'^-') and exponential weights Wj{'x.) = exp(x-^Qj), where Q.j 
is a vector of coefficients, (5) becomes 

9Cov(;u(s),//(s -I- /i)|x) 

M 

i=i 

where Ax{s + h) denotes the vector of derivatives of 2;(s -|- h) with respect 
to h. The covariance is decreasing in h if Aa;(s -|- h)oLj < KjPjh'^^^^ for all j 
and h. This shows that it is possible to allow the spatial covariance to depend 
on covariates but retain monotonicity by restricting the parameters cxj, Kj 
and pj based on bounds on the covariate process derivatives. 

3.2. Smoothness properties of the spatial process. The smoothness prop- 
erties of a Gaussian process are often quantified in terms of the mean squared 
continuity of its derivatives. For many spatial processes, including the non- 
stationary model of Fuentes (2002), the smoothness of their process realiza- 
tions is well studied [see Banerjee and Gelfand (2003), Banerjee, Gelfand and 
Sirmans (2003)]. However, our model postulates a more general dependence 
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of the covariance on spatial covariates. Hence, in this section we explore the 
effect of that dependence on the smoothness properties of the realizations. 
For notational convenience, we assume a one-dimensional spatial process 
with s gTZ; the results naturally extend to more general direction deriva- 
tives by taking s = u^s for any unit vector u. We start by assuming the 
covariates x are fixed; this assumption will be later relaxed. 

Following the arguments of Banerjee and Gelfand (2003), we say that 
the kth derivative (with respect to s) of the process ^ (if it exists) is mean 
square continuous at s if 

(7) lim E[^('=)(s) - /iW(s + h)\xf = 0. 

For A; = 0, we can substitute (3) in (7) and get 
limE[^(s) -/x(s + /i)|xl^ 

M 

(8) =J2}imK,iO){wMs + h)]-wMs)]f 

M 

+ Y,l^m2wj[^{s)]wj[^{s + h)]{Kj{0)-K,{h)), 

which shows that /i is mean square continuous if each latent process is 
mean square continuous [liuih^Q Kj{h) = Kj{0)] and the weights are smooth 
enough to satisfy lim/i_^o('^i i^i'^ + ^)] ~ '^j i^i-^)])'^ — example, 
they are continuous functions of the continuous spatial covariates. 

In some settings, it may be reasonable to consider x to be a random 
process. We extend the discussion of Banerjee and Gelfand (2003) to the 
case when the weights are functions of stochastic covariates. In this case, 
to study the smoothness of /i requires considering variability in both the 
latent Oj as well as the covariates x. The covariates enter the covariance 
model only through the stochastic weights Wj{s) = Wj\x.{s)]. Taking the 
expectation with respect to both 9j and Wj[s) gives 

limE[/x(s) -/i(s + /i)]^ 

M 

M 

+ 2^ lim(i^j(0) - Kj{h))^w,W3{s)Wj{s + h)]. 
i=i 

Therefore, under stochastic covariates, the process /x is mean square continu- 
ous if and only if the latent processes 6j and the weight processes Wj are both 
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mean square continuous. It is well known from probability theory that the 
weight function Wj is mean square continuous, for example, if it is bounded 
and the covariate processes are almost surely continuous. Mean square conti- 
nuity also follows when Wj is Lipschitz continuous of order 1 and the covari- 
ate processes are mean square continuous. For example, the logistic weights 
Wj{'x.) = exp(x-^Q;j)/[l -|- exp(x-^Q!j)] are both bounded and Lipschitz con- 
tinuous of order 1, whereas exponential weights u'j(x) = exp(x"^Q;j) are not. 

These results naturally extend from mean squared continuity to mean 
square differentiability, and higher order derivatives. Since //(s) is the sum 

of stochastic processes Zj{s) = Wj{s)dj{s), then fi^^\s) = X^^i ■^j'^^(s)- In 
particular, for k = 1 the derivative process at s is 

(10) /.W(.) = J2ef\s)Wj{s) + e,{s)w^'\s). 

So the process fi is mean square differentiable if both Wj{s) and 6j{s) are 
mean square differentiable. Conditions analogous to those outlined above 
for mean square continuity will assure that the weights are mean square 
differentiable. More precisely, if the covariate processes xi{s), . . . ,Xp{s) are 
mean square differentiable and the function Wj{-) is Lipschitz continuous of 
order 1, then the resulting process Wj{s) is mean square differentiable, and 
so is n{s). 

One could go further to study sample path properties and almost sure 
continuity of the induced spatial process, although the required proofs are 
generally more difficult than the proofs of mean square properties. If both the 
weight functions and latent processes are almost surely continuous, then the 
induced spatial process is also almost surely continuous. Adler (1981) and 
Kent (1989) provide conditions to verify almost sure continuity for spatial 
fields. 

3.3. Span of the covariance function. The covariance in (4) is quite flex- 
ible. For example, consider partitioning the covariate space in N subsets 
^1, . . . ,An and 

N 

Wj[:s.{s,t)] =^aji/(x(s,t) £Ai). 

1=1 

When x(s,t) G Ai and x(s',t') G A/^, the covariance becomes 

M 

Cov(^(s, t),ij{s', t') |x) = ^ ajittjkKjis -s',t- t'). 

i=i 

Hence, each covariance Kj{s — s',t — t') contributes to the mixture differ- 
ently according to the levels of the covariates. Setting some of the weights 
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Uij = allows Kj to contribute only to the covariance of terms with specific 
combinations of covariates, for example, both observations have low wind 
speed and high cloud cover. Also, by setting some of the aij < 0, it is possible 
to specify negative correlation for observations with different levels of the 
covariate. By increasing M and N, this argument shows how the covariate- 
dependent weights can be used to describe quite general spatiotemporal 
behavior depending on the covariates. 

4. Priors and model-fitting. In this section we describe a convenient 
specification of the model. For notational convenience, we assume that at 
each time point observations are taken at spatial locations si, . . . , s^v G 
and that t € {1,2,...}. The overall spatial trend 5 is a Gaussian process 
with mean zero and spatial covariance Kq. We assume that 5's covariance 
is stationary, although one could allow 5's covariance to be nonstationary 
as well. We assume an autoregressive spatiotemporal model for the latent 
processes 6j, 

(11) 9j{s,t) = 7j%(s, t-l)+ ej{s,t), 

where 7j G (0, 1) controls the temporal correlation and the ejt = [ej{si,t), . . . , 
ej{s]\f,t)] are independent (over j and t) spatial processes with mean zero 
and spatial covariance Kj. We use exponential covariance functions for Kj, 
j = 0, . . . , M. We note that although this is a relatively simple specifica- 
tion for the temporal component for each latent process, complex temporal 
covariance structures can emerge from this mixture model. The covariance 
between subsequent observations at a site is a mixture of M autoregressive 
covariances that varies with space and time according to the covariates. This 
approach could be very useful for modeling hourly ozone which is generally 
low and steady at night, and high and volatile in the day, which could be fit 
by including hour of the day as a covariate in the weights. 

As mentioned in Section 2, there is confounding in (4) between the scale 
of the weights Wj and covariances Kj. Therefore, for identification pur- 
poses we restrict the squared weights for each observation to sum to one, 
'^jLiWj[x.{s,t)]'^ = 1. Although there are other possibilities, we assume the 
weights have the multinomial logistic form 



(12) wMs,t)f 



exp (x(s, t)'^cx.jj 
Y^iii^W (x(s,t)^Q;0 



where cti, . . . , ccj// are vectors of regression coefficients that control the ef- 
fects of the covariates on the covariance. For these weights setting M = 1 
gives wi[x(s,t)] = 1 and the model is stationary with covariance Ki. The 
choice of logistic weights also ensures mean square continuity of the pro- 
cess realizations, as outlined in Section 3. For identification purposes, we fix 
cxi = 0, as is customary in logistic regression. 
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The priors for the hyperparameters are uninformative. We use N(0, 10^) 
priors for the elements of /3 and aj . The covariance parameters have priors 

CT-2,r-2 '-^ ■ Gamma(0.1,0.1), and ~ Unif(0, 1). Also, we take K^{\\hs\\) = 
exp(— where hs is the distance between points after a Mercator 
projection, scaled to correspond roughly to distance in kilometers, and pj ~ 
Unif (0,2,000). 

The covariance and the effect of an individual covariate on the covari- 
ance in (4) are rather obscure. This is due to the label-switching problem, 
that is, the labels of the processes are arbitrary: for example, 9i may corre- 
spond to a high variance process for some MCMC iterations and to a small 
variance process for others, making inference on individual parameters diffi- 
cult. One remedy for the label-switching problem is to introduce constraints, 
perhaps, Var(0i) < • • • < Var(0A./). However, ordering constraints on complex 
functions such as spatiotemporal covariance functions is not straightforward. 
Therefore, rather than summarizing the individual parameters in the model, 
we summarize the entire covariance function for different combinations of 
covariates. A simple way to summarize the effect of the kth. covariate is in 
terms of the posterior of the ratio of the covariance of two observations with 
Xfc = 2 (standard deviation units above the mean) compared to the covari- 
ance of two observations with = 0, assuming all other covariates are fixed 
at zero (their mean). That is, 

. , , Eili((exp(aji + ajfc))/(X;^iexp(an + a/fc)))i^j(/is,/it) 
(13) Ak{hs,ht) = ^ — ^^aTTT } WTJ^i 



E7=i((exp(aji))/(X;;^i exp{aii)))Kj{hs, h 



where ajk is the A:th element of cxj and Kj{hs, ht) = J<|(||/is||)7j''''. We also 
inspect the ratio of correlations Ak{hs,ht) = Af:{hs,ht)/Ak{0,0). We con- 
sider a covariate to have a significant effect on the variance if the posterior 
interval for Afc(0,0) excludes one. Similarly, we consider a covariate to have 
a significant effect on the spatial (temporal) correlation if the posterior in- 
terval for Afc(/is,0) [Afc(0, /it)] excludes one. 

Finally, we discuss how to select the number of terms, M. One approach 
would be to model M as unknown and average over model space using 
reversible jump MCMC. Lopes, Salazar and Gamerman (2008), Salazar, 
Lopes and Gamerman (2011) use reversible jump MCMC to select the num- 
ber of factors in a latent spatial factor model. However, this approach is 
likely to pose computational challenges for large spatiotemporal data sets. 
Therefore, we select the number of terms using cross-validation and as- 
sume M is fixed in the final analysis. For cross-validation, we randomly 
(across space and time) split the data into training {n = 63,881) and test- 
ing (A^ = 3,367) sets. We fit the model on the training data and compute 
the posterior predictive distribution for each test set observation. We then 
compute the mean squared error MSE = X^j(^i — /N and mean abso- 
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lute deviation MAD = Yli(Xi - Yif/^i where the sum is over the test 
set observations, Yi is the posterior mean, and Yi is the posterior median. 
We also compute the mean (over the test set observations) of the posterior 
predictive variances ("AVE VAR"), the median of the posterior predictive 
standard deviations ( "MED SD" ) and the coverage probability of 95% pre- 
diction intervals. 

5. Computational details. We implement the model in R (http://www. 
r-project.org/). Though implementation in WinBUGS (http://www.mrc- 
bsu.cam.ac.uk/bugs/) would also be straightforward, run times might be 
long for large data sets. We update Ojt = [9j{si,t), . . . ,6j{s]\j,t)], (3, 
and 7j, which have conjugate full conditionals, via Gibbs sampling, and we 
update ttjfc, pj and Vj using Metropolis-Hastings sampling with a Gaussian 
candidate distribution tuned to give acceptance probability around 0.4. 

Sampling using the dynamic spatial model in (11) allows us to update 
the Qjt as a block and avoid inverting large matrices. The alternative of 
sampling after marginalizing out the latent Ojt would require computing the 
entire spatiotemporal covariance with elements given by (4), which would 
likely give better mixing for small to moderate data sets. For our large data 
set, however, matrix computations of this size are not feasible. 

We monitor convergence with trace and autocorrelation plots for several 
representative parameters. Monitoring convergence is challenging for this 
model since the labels of the latent terms may switch during MCMC sam- 
pling: exchanging ai, v\ and 71, for example, with 0:2, P2^ 1^2 and 72, 
does not affect the covariance in (4). Rather than monitoring convergence 
for these parameters individually, we therefore monitor convergence of the 
covariance (4) at several lags and of the spatiotemporal effect fJ-{s,t) for 
several spatiotemporal locations. For the application in Section 6 we gen- 
erate 20,000 samples, discarding the first 10,000 as burn-in. For the expo- 
nential covariances considered here this appears to be sufficient; however, 
for smoother processes, such as those induced by the squared exponential 
covariance, 20,000 iterations may not be sufficient. 

6. Application to southeastern US daily ozone. To illustrate our spa- 
tiotemporal covariance model, we analyze ozone in the southeast US. The 
primary National Ambient Air Quality Standard (NAAQS) for ozone re- 
quires the three-year average of the annual fourth-highest daily maximum 
8-hour daily average concentration to fall beneath 75 parts per billion (ppb) 
[CFR (2008), pages 16436-16514]. Our response variable is thus the square 
root — to ensure Gaussianity — of the daily "8-hour ozone" metric. We fo- 
cus on the 89 sites in North Carolina, South Carolina and Georgia shown 
in Figure 2. This geographically heterogeneous region transitions from the 
flat, low-altitude coastal plains in the east, to the gentle, rolling hills of the 
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(a) Average ozone (ppb) (b) Trace plots of ozone 



Fig. 2. Plots of square root ozone (pbb). Panel (a) plots the average for each station (the 
stations are marked with points) and panel (b) gives trace plots for each station in August 
2005 (Day 1 is August 1, 2005). 

piedmont, to mountains in the northwest, with a handful of urban islands 
buffered by suburbs that give way to rural tracts. Since summertime ozone 
concentrations are highest, and therefore most relevant for attainment deter- 
mination, we extract daily 8-hour ozone concentrations, longitude, latitude, 
elevation and site classification (urban, suburban or rural) for June- August, 
1997-2005 (6444/73,692 = 8.7% missing) from the US EPA Air Quality 
System (AQS) database, available via the Air Explorer web tool (http:// 
www.epa.gov/airexplorer/index.htm). 

We obtain daily average temperature and daily maximum wind speed 
from the National Climatic Data Center (NCDC) Global Summary of the 
Day and daily average cloud cover from the NCDC National Solar Radiation 
database. Since meteorological and ozone data are not observed at the same 
locations, for each day we predict each meteorological variable at ozone ob- 
servation sites using spatial Kriging implemented in SAS V9.1 Proc MIXED 
with an exponential covariance function. Though Li, Tang and Lin (2009) 
show that ignoring uncertainty when using spatial predictions of covariates is 
not without consequence, accounting for that uncertainty is nontrivial. Since 
our current focus is the development of the covariate-dependent covariance 
model, we treat these predictions as fixed. 

Covariates x(s, t) in the mean trend include the continuous variables tem- 
perature, wind speed, cloud cover, elevation, longitude, latitude and a linear 
trend in year, each standardized to have mean zero and variance one, and we 
include two indicator variables identifying a station as urban or rural, leav- 
ing suburban as the baseline. We have no detailed land-use covariates as in 
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Paciorek et al. (2009), however, which would hkely improve fine-scale predic- 
tion. We consider all two-way interactions between the three meteorological 
variables and quadratic effects of the meteorological variables. The covari- 
ance is modeled as a function of only the main effects of these covariates. 

6.1. Empirical variogram analysis. We begin studying the data by an- 
alyzing the spatial variogram, defined as ^{h) = E{[r{s,t) — r(s -|- /iu,t)]^), 
where r{s,t) is the residual after accounting for the mean trend and u is 
a unit vector. Though there is evidence that regression coefficient estima- 
tion can be affected by disregarding spatial correlation [Reich, Hodges and 
Zadnik (2006), Wakefield (2007), Paciorek (2010)], for simphcity we use ordi- 
nary least squares, pooled over all observations, to estimate the mean trend. 
We estimate the variogram as the mean squared difference between all pairs 
of observations in a bin Dh, that is. 



where is the set of pairs of points on the same day with ||s — s'|| € 
{h — e,h + e) and \Dh\ is the cardinality of D^. 

To explore the effects of each of the covariates on the spatial covariance, 
we compute individual variograms for three categories of site pairings. In 
the "low-low" category, both sites have values of the covariate below the 
sample median for the covariate; in the "high-high" category, both have 
values above the median; and in the "low-high" category, one has a value 
below, and the other above, the median. Such variograms for cloud cover 
and wind speed are given in Figures 3 and 4, respectively. 

In Figure 3(a), the variogram is lowest for pairs of observations for which 
both sites have high cloud cover, higher when both sites have low cloud 
cover, and highest when one site has low and the other high cloud cover. Solar 
radiation is required to turn NO2 into ozone or to create VOC's that turn NO 
into NO2. Therefore, under high cloud cover conditions, ozone levels would 
be expected to drop close to background levels (a long-term equilibrium that 
would exist in the absence of local emissions), which would be homogeneous 
over a region of this size. Two sites for which cloud cover is low would be 
expected to be less similar to each other than would two sites that both 
have high cloud cover because the production of ozone via solar radiation is 
now dependent on the spatially-varying precursors. For example, areas very 
close to major sources of NOx (power plants and urban centers on workdays) 
would have low ozone due to NOx scavenging, and moving downwind from 
these sources, ozone would increase and then decrease. Finally, based on the 
explanation above, it is clear that if one site has high cloud cover, so that 
ozone production is minimal, and the other has low cloud cover, so that 
ozone production is rampant, they would have very dissimilar ozone values, 
so that the variogram would be highest for the low-high category. 
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200 400 600 200 400 60C 

Spatial distance (l^m) Spatial distance (km) 

(a) Variogram (b) Relative variogram 




200 400 600 200 400 

Spatial distance (km) Spatial distance (km) 



(c) Variogram of log ozone (d) Variogram of daily standardized 

residuals 

Fig. 3. Sample variograms for the ozone data by cloud cover. The data are plotted 
separately for pairs of observations with both ("Low-Low"), one ("Low-High") and nei- 
ther ("High-High") members of the pair with cloud cover below the median cloud cover. 
Panel (b) plots the ratio of curves in (a), panel (c) uses log-transformed, rather than 
square-root-transformed data, and panel (d) standardizes the residuals by the daily sample 
standard deviation. 



Wind speed does not generally affect the chemical reactions that create 
or destroy ozone, but it does transport ozone and its precursors. One would 
expect that within smaller subregions with higher wind speeds, distance is 
effectively shortened so that spatial correlation would be higher, and two 
sites in the "high-high" category would have lower variogram, followed by 
those with "low-high," then "low-low," as we see in Figures 4(a) and 4(b) 
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Fig. 4. Sample variograms for the ozone data by wind speed. The data are plotted sep- 
arately for pairs of observations with both ("Low-Low"), one ("Low-High") and neither 
( "High-High" ) members of the pair with wind speed below the median wind speed. Panel (b) 
plots the ratio of curves m (a), panel (c) uses log-transformed, rather than square-root- 
transformed data, and panel (d) standardizes the residuals by the daily sample standard 
deviation. 



for spatial lags below 250 km. The ordering of the categories is reversed for 
larger spatial lags, where transport is less relevant. 

In addition to computing these variograms for our square root ozone re- 
sponse, we compute the variograms using residuals from a regression on the 
log, rather than square root, of ozone, and the variogram of standardized 
residuals, that is, r*(s,t) =r(s,t)/sj, where st is the sample standard devi- 
ation of the residuals for day t. The variograms are affected more by the log 
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Table 1 

Validation set results. The summaries are mean squared error (MSE), median absolute 
deviation (MAD), mean posterior predictive variance (AVE VAR), median posterior 
predictive standard deviation (MED SD) and coverage probability of 95% intervals 
(GOV). All values are multiplied by 100 



M 





1 


2 


3 


4 


5 


MSE 


18.9 


18.6 


18.3 


18.2 


17.9 


MAD 


23.5 


22.7 


22.2 


22.0 


21.4 


AVE VAR 


18.3 


17.0 


16.7 


16.4 


16.7 


MED SD 


41.9 


40.0 


39.2 


38.8 


38.2 


GOV 


95.7 


95.5 


95.3 


95.3 


95.2 



transformation than by standardizing. The same general patterns remain 
after standardizing, but new ones emerge after a log transformation. For 
example, the ordering of the variograms for cloudy and sunny days switches 
after a log transformation in Figure 3. The patterns of the log-transformed 
responses also indicate covariate-dependent covariance, so it appears that 
the transformation is important, but does not resolve nonstationarity. 

6.2. Results. We fit five versions of the model, with the number of mix- 
ture components varying from M = 1 to 5. We withheld 5% of the observa- 
tions (3,687 observations), selected randomly across space and time. Table 1 
compares for predictions of square root ozone for this validation set. For all 
models, the prediction intervals have coverage greater than 0.95. The five- 
component model minimizes all measures of prediction error and variance. 
The ratio of mean squared error for the five-component model to that of 
the stationary one-component model is 0.179/0.189 = 0.947, and the corre- 
sponding ratio of average prediction variances is 0.167/0.183 = 0.913. The 
nonstationary covariance thus gives a modest improvement in prediction ac- 
curacy and uncertainty quantification. We also tried higher values of M and 
found slight improvements in prediction, but elected to proceed with M = 5 
for model simplicity. 

The largest effect of nonstationary is in the measures of prediction uncer- 
tainty. Figure 5 plots the prediction standard deviation for the observations 
in the validation set for the stationary model with M = 1 and the non- 
stationary model with M = 5. The standard deviation is smaller for the 
nonstationary model for 72% of the observations, and varies far more across 
observations for the nonstationary model (roughly from 0.25 to 0.80) com- 
pared to the stationary model (roughly 0.35 to 0.65). To show that the con- 
ditional coverage remains valid for both models, we separated the validation 
set into five equally sized groups based on the ratio of standard deviations 
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Prsdtctlon SD, Ms1 



Fig. 5. Posterior predictive standard deviation for the observations in the validation set 
for the stationary model with M — 1 compared to the nonstationary model with AI — 5. 



from the models with M = 5 to M = 1. The coverage of 95% intervals in 
these five groups (from smallest to largest relative variance) is 0.97, 0.97, 
0.96, 0.96 and 0.93 for the stationary model and 0.93, 0.96, 0.94, 0.97 and 
0.96 for the nonstationary model. 

Table 2 and Figure 6 summarize the covariate effects on the mean and 
spatiotemporal correlation for the full data set with M = 5. The mean trend 
accounts for most of the variability in square root ozone: though the sample 
variance of the observations is 1.61 ppb, the posterior means of the spatial 
effects 6{s) have variance 0.09 ppb. The statistical significance of the linear 
and quadratic temperature terms and the positive effect of temperature on 
variance are consistent with findings of Nail, Hughes-Oliver and Monahan 
(2010) and others that ozone concentration is a monotone increasing nonlin- 
ear function of temperature, and ozone variance increases with the mean. It 
is reasonable that spatial correlation decreases as temperature increases due 
to the fact that when the solar radiation is conducive to the chemical reac- 
tions that produce ozone, that production is a function of local emissions, 
and highly nonlinear in NOx emissions, which vary over space. Similarly, 
it is reasonable that spatial correlation at short spatial lags increases with 
wind speed because wind facilitates transport of ozone and its precursors. 

As discussed in Section 6.1, the relationship between cloud cover and 
ozone is quite complex. We find that cloud cover is negatively associated 
with the mean and temporal correlation, and positively associated with vari- 
ance and spatial correlation. As expected, mean ozone decreases and spatial 
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Table 2 

Summary of the model with M — 5 components. The remaining columns give the 
posterior means (95% intervals) for the mean effects jSk, the relative variance (Afc(0, 0)), 

the relative spatial correlation at lag 100 km (Afc(100,0)), and the relative temporal 
correlation at lag 2 days (Afc(0,2)). 13^, Afe(0,0), Afc(100,0), and Afc(0,2) are scaled to 
represent the effect of a two standard deviation increase in the predictor 







Mean 


Variance 


Spatial cor. 


Temporal cor. 






/3fe 


Ate(0,0) 


Afc(100,0) 


Afe(0,2) 


Temperature (F) 


0.333 


(0.331,0.358) 


1.09 (1.06,1.12) 


0.88 (0.86,0.90) 


1.09 (1.03,1.16) 


Wind speed (m/s) 


-0.028 


(-0.037,-0.019) 


0.96 (0.94,0.97) 


1.05 (1.04,1.06) 


0.97 (0.94, 1.00) 


Cloud cover {%) 


-0.154 


(-0.173,-0.134) 


1.12 (1.07,1.17) 


1.05 (1.03,1.06) 


0.57 (0.51,0.64) 


Elevation (ft) 


0.115 


(0.078,0.183) 


0.98 (0.96,1.01) 


1.10 (1.09,1.11) 


1.30 (1.24,1.37) 


Urban 


0.007 


(-0.020,0.035) 


1.00 (0.98,1.02) 


0.95 (0.93,0.96) 


0.99 (0.96,1.02) 


Rural 


0.045 


(0.010,0.066) 


0.57 (0.43,0.77) 


0.94 (0.90,0.97) 


1.17 (1.02,1.37) 


Year 


0.004 


(-0.001,0.008) 


1.01 (1.00,1.02) 


1.03 (1.02,1.03) 


0.95 (0.93,0.97) 


Longitude 


0.096 


(0.018,0.187) 


0.99 (0.94,1.03) 


1.12 (1.10,1.13) 


1.53 (1.43,1.62) 


Latitude 


0.185 


(0.089,0.251) 


0.70 (0.67,0.74) 


1.05 (1.03,1.07) 


0.48 (0.40,0.55) 


Temp2 


0.023 


(0.014,0.032) 








WS2 


0.004 


(0.002,0.005) 








CC2 


-0.020 


(-0.029,-0.010) 








Temp X WS 


-0.005 


(-0.013,0.003) 








Temp X CC 


0.055 


(0.044,0.068) 








WS X CC 


0.004 


(-0.003,0.012) 









correlation increases with cloud cover since ozone levels drop near low, het- 
erogenous background levels in the absence of solar radiation. A possible 
explanation for low variance and high temporal autocorrelation for sunny 
days is the common southeastern summertime meteorological regime called 
the "Bermuda high," which is characterized by sunny skies and high at- 
mospheric pressure indicative of a lower atmospheric boundary layer. The 
lowered ceiling combined with low wind speed effectively reduce the volume 
in which emissions interact, which, combined with high solar radiation, cre- 
ates a simmering cauldron of ozone production. Because the Bermuda high 
persists over several days and spans regions greater than or equal to the 
size of our spatial domain, ozone production is high everywhere, so that the 
variability is lower and the temporal correlation is higher. 

Figure 6 plots the estimated spatial and temporal covariance for several 
combinations of the covariates. Figures 6(a) and 6(b) show that the esti- 
mated spatial correlation is lower for spatial lags less than 100 km for hot 
days, and that temperature is less relevant at larger distances. This plot 
also shows the mixture of exponential correlation functions gives a corre- 
lation that is significantly different than a simple exponential correlation. 
The mixture correlation function drops more quickly near the origin and 
has a heavier tail than an exponential correlation. Cloud cover also affects 
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Fig. 6. Posterior mean (thick lines) and 95% intervals (thin lines) of the spatiotemporal 
correlation (4) for various combinations of the covariates. "Baseline" assumes that all 
covariates are zero (the mean after standardization) for both observations. The other plots 
assume that all covariates are zero with the exception of one covariate, which equals two 
standard deviation units above the mean. Panel (b) plots the posterior of ratio of the 
spatial correlations under high temperature and baseline conditions plotted in panel (a). 
The spatial correlation is plotted as a function of spatial distance hs with temporal distance 
ht = 0, and vice versa. 



both the spatial covariance and temporal autocorrelation. Figure 6(c) shows 
that the variance is higher on cloudy days, but the covariance has smaller 
spatial range. Also, the temporal correlation in Figure 6(d) is higher for lags 
one, two and three for sunny days. 

Figure 7 compares the posterior mean of the stationary one-component 
model to that of the nonstationary five-component model, and shows the 
relationship between the spatial covariance of the latter model with temper- 
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(a) Elevation (ft) 




(e) Temperature (F) on July 27, 2005 



Fig. 7. Data and spatial correlation esti 
nonstationary (M — 5) models. Panels (1 
correlation between the point marked with 




' — 0,0 

(b) Correlation, M = 1 




' 1- 0.0 

(f) Correlation on July 27, 2005, M = 5 



lates for two days for stationary (M — 1) and 
, (d) and (f) plot the posterior mean of the 
dot and the remaining sites. 
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ature and elevation. Figure 7(b) shows the exponential decay in correlation 
with increasing distance from the marked site for the stationary model; this 
correlation function is the same for the two days under consideration, June 7, 
1997, and July 27, 2005, which have the minimum and maximum tempera- 
tures at the marked site. The temperature contours for those days are plotted 
in Figures 7(c) and 7(e), and elevation contours are plotted in Figure 7(a). 
The spatial correlation contours in the northwest of Figure 7(d) show the 
negative effect of elevation on spatial correlation. The July 27, 2005 position 
of the maximum temperature peak over the marked site clearly shows the 
effect of temperature on the steepness of the decline in correlation at short 
versus long lags seen earlier in Figure 6(a). The effect of elevation on corre- 
lation is dwarfed by the effect of temperature, likely due to the positioning 
of the temperature peak over the marked site combined with the magnitude 
of the temperature at that peak. 

7. Discussion. In this paper we present a class of spatiotemporal covari- 
ance functions that allows the covariance to depend on environmental condi- 
tions described by known covariates. Although fitting this, and other sophis- 
ticated spatiotemporal models, likely requires expertise in spatial statistics 
and computing methods, the method produces interpretable summaries of 
the effect of each covariate on the mean, variance, and spatial and temporal 
ranges. For the southeastern US ozone data, we find our nonstationary anal- 
ysis improves prediction error, reduces prediction variance, and achieves the 
desired coverage probabilities, while identifying several interesting covariate 
effects on both the mean and covariance. 

Our covariance model assumes that all nonstationarity can be explained 
by the spatial covariates. However, in some cases a more flexible model would 
be useful. One approach would be to add more pure functions of space and 
time as covariates in the covariance to capture nonstationarity. An even more 
flexible model would take the weights to be Gaussian processes, possibly 
with means that depend on the covariates, to allow the weights to vary 
smoothly through the spatial domain while still making use of the covariate 
information. 
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